Abstract

A framework for computing shape statistics in general, and the average in particular, for dynamic shapes is introduced in this paper. Given a metric $d(\cdot,\cdot)$ on the set of static shapes, the empirical mean of $N$ static shapes, $C_1, \ldots, C_N$, is defined by $\arg\min_C \sum_{i=1}^{N} d(C, C_i)^2$. The purpose of this paper is to extend this shape average work to the case of $N$ dynamic shapes and to give an efficient algorithm to compute it. The key concept is to combine the static shape statistics approach with a time-alignment step. To align the time scale while performing the shape average we use dynamic time warping, adapted to deal with dynamic shapes. The proposed technique is independent of the particular choice of the shape metric $d(\cdot,\cdot)$. We present the underlying concepts, a number of examples, and conclude with a variational formulation to address the dynamic shape average problem. We also demonstrate how to use these results for comparing different types of dynamics. Although only the average is addressed in this paper, other shape statistics can be similarly obtained following the framework proposed here.


1.  Introduction 

Understanding shape and its basic empirical statistics is important both in recognition and analysis, with applications ranging from medicine to security to consumer photography. The basic metrics and statistics of static shapes have been the subject of numerous fundamental studies in recent years, see for example [1, 3, 4, 6, 8, 13, 15, 16] and references therein. In particular, given a metric on the set of static shapes (distance between two samples), the empirical mean shape of $N$ static shapes, as well as other basic statistics, can be defined and computed. These are then used for diverse shape studies, from the recognition of particular objects to the detection of abnormalities in medical data. The purpose of this work is to extend this to dynamic shapes. This is fundamental for studies such as those involving gait, behavior, growth patterns, and all problems involving motion, deformations, and time-varying shapes.

Given $N$ dynamic shapes $\Gamma_1(t), \ldots, \Gamma_N(t)$ ($t$ stands for the time parameter, see Figure 1), we want to find $M(t)$, the empirical mean of these shapes. This basic computation will be used throughout this paper as an example of how to perform statistics on dynamic shapes. One idea could be to simply perform a static average among $\Gamma_1(t_i), \ldots, \Gamma_N(t_i)$ for each time instance $t_i$, a process that is clearly not effective for every kind of data. Indeed, the initial shape series might not be time-aligned (e.g., due to different growth rates in medical applications and different motion speeds in gait analysis). The dynamic shapes need to be properly aligned before any kind of shape statistics technique is applied.$^1$ This is exactly the role of dynamic time warping (DTW), see Figure 1. This process is commonly used in speech recognition in order to time-align speech patterns to account for differences in speaking rates across speakers. It has also been used by a number of authors for gait analysis, but limited to the 1D path obtained by the tracking of particular joints. In this work we propose to combine DTW with results on static shape analysis to compute basic statistics on dynamic shapes. The framework proposed here is independent of the particular choice of static shape metric. This work deals with discrete time instances, while the extension to a continuous framework is discussed in the conclusions section.


1 The topic of time alignment also appears in video (see for example [2] and references therein). The goals and techniques used there are completely different from the ones presented here. The use of our proposed framework for video alignment is the subject of future research.

2.  Static  Shape  Averaging

In order to compute the mean of $N$ static shapes, a distance on the set of the shapes is necessary (for examples of such metrics see [3, 7, 13, 14] and references therein). Once the metric is given, the empirical mean shape can be defined:

Definition 1 Let $d(\cdot,\cdot)$ be a distance on the set of shapes and $C_1, \ldots, C_N$, $N$ static shapes. The empirical mean $M(C_1, \ldots, C_N)$ is given by

$$M(C_1, \ldots, C_N) = \arg\min_{C} \sum_{i=1}^{N} d(C, C_i)^2$$
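
As an illustration of Definition 1, the following is a minimal sketch under a strong assumption that is not part of the paper: each static shape is a fixed-length NumPy vector and $d$ is the Euclidean distance, in which case the arg min is simply the arithmetic mean. The function names (`euclidean`, `sum_sq_dist`, `empirical_mean_euclidean`, `medoid`) are hypothetical. For a general metric no closed form exists, so the sketch also shows a crude fallback that restricts the search to the samples themselves.

```python
import numpy as np

def euclidean(c1, c2):
    """Assumed metric d: Euclidean distance between two shape vectors."""
    return float(np.linalg.norm(np.asarray(c1, float) - np.asarray(c2, float)))

def sum_sq_dist(c, shapes, d=euclidean):
    """Objective of Definition 1: sum_i d(C, C_i)^2."""
    return sum(d(c, ci) ** 2 for ci in shapes)

def empirical_mean_euclidean(shapes):
    """Under the Euclidean metric the minimizer is the arithmetic mean."""
    return np.mean(np.stack([np.asarray(s, float) for s in shapes]), axis=0)

def medoid(shapes, d=euclidean):
    """Metric-agnostic fallback: the sample that minimizes the objective."""
    return min(shapes, key=lambda c: sum_sq_dist(c, shapes, d))

# Three toy "shapes" (assumed data, for illustration only).
shapes = [np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([1.0, 3.0])]
mean_shape = empirical_mean_euclidean(shapes)
```

The medoid is only an approximation of the empirical mean, but it needs nothing beyond evaluations of $d$, which matches the metric-independent spirit of the framework.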

The goal of this paper is to present a natural and easy-to-compute extension of this definition for dynamic shapes. Before doing this, let us briefly recall the second fundamental component of our approach, dynamic time warping.

3.  Dynamic  Time  Warping 

Dynamic time warping (DTW) is principally used in speech recognition to time-align speech patterns in order to account for differences in speaking rates across speakers. A distance between two speech patterns can then be computed by this technique in order to be able to compare them (see [9]). DTW can be adapted to deal with other types of signals, as done in this paper for shapes.

When the two signals $(A, B)$ to be matched are defined as sampled time functions, $A = a_1, \ldots, a_L$ and $B = b_1, \ldots, b_M$, the basic problem in DTW is to find two time warping functions $f$ and $g$ such that

$$\sum_{t=1}^{T} d(a_{f(t)}, b_{g(t)})$$

is minimized (here $d(\cdot,\cdot)$ stands for the function measuring the discrepancy between two samples).

Computing these warping functions can be viewed as the process of finding a minimum-cost path through the lattice of points $(a_i, b_j)$, starting from $(1, 1)$ and ending at $(L, M)$ (see Figure 2),$^2$ where the cost of a path is defined by:

$$D(f, g) = \sum_{t=1}^{T} d(a_{f(t)}, b_{g(t)})^2$$

and $f$ and $g$ are subject to the following constraints:

2 Note that we use $a_i$ and $b_i$ both to denote the time positions and their corresponding values, the distinction being clear from the context.


1. $f$ and $g$ must be monotonic:

$f(k) \ge f(k-1)$ and $g(k) \ge g(k-1)$

2. $f$ and $g$ must match the endpoints of $A$ and $B$:

$f(1) = g(1) = 1$, $f(T) = L$ and $g(T) = M$

3. $f$ and $g$ must not skip any points:

$f(k) - f(k-1) \le 1$ and $g(k) - g(k-1) \le 1$

4. A limit on the maximum amount of warp is fixed by $|f(k) - g(k)| \le Q$, $Q$ being the given "window width".
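
The four constraints can be checked mechanically. The sketch below is an illustration only: the function name `is_valid_warp` and the convention of passing $f$ and $g$ as Python lists of 1-based indices are assumptions made here, not notation from the paper.

```python
def is_valid_warp(f, g, L, M, Q):
    """Check constraints 1-4 for warping functions given as 1-based index lists."""
    if len(f) != len(g) or not f:
        return False
    # Constraint 2: both functions match the endpoints of A and B.
    if (f[0], g[0]) != (1, 1) or (f[-1], g[-1]) != (L, M):
        return False
    for k in range(1, len(f)):
        step_f = f[k] - f[k - 1]
        step_g = g[k] - g[k - 1]
        # Constraints 1 and 3 together: each step is either 0 or 1.
        if step_f not in (0, 1) or step_g not in (0, 1):
            return False
    # Constraint 4: the warp never leaves the window of width Q.
    return all(abs(a - b) <= Q for a, b in zip(f, g))

# Example (assumed data): a path of length T = 4 on a 3 x 3 lattice.
ok = is_valid_warp([1, 2, 3, 3], [1, 1, 2, 3], L=3, M=3, Q=1)
```

Constraints 1 and 3 combined mean that each path step moves by at most one position in each signal, which is what restricts the lattice path to horizontal, vertical, and diagonal unit moves.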

In  the  example  in  Figure  2,  the  time  warping  functions 
are: 

At first glance, it would seem as if $D(f, g)$ would have to be evaluated for a prohibitively large number of possible paths. Fortunately, dynamic programming brings this problem under control by noting that the best path from $(1, 1)$ to any given point is independent of what happens beyond that point. Hence, if we call $D(i_k, j_k)$ the total cost of the best path from $(1, 1)$ to $(i_k, j_k)$, this is the cost of the point $(i_k, j_k)$ itself plus the cost of the cheapest path to it:

$$D(i_k, j_k) = d(i_k, j_k)^2 + \min_{\text{legal}(i_{k-1}, j_{k-1})} D(i_{k-1}, j_{k-1})$$

By the subscript "legal$(i_{k-1}, j_{k-1})$" we mean the minimum over all permissible predecessors of $(i_k, j_k)$. By constraints 1 and 3 above, there are only three legal predecessors: $(i_k - 1, j_k)$, $(i_k, j_k - 1)$ and $(i_k - 1, j_k - 1)$. Therefore we need to consider only three possibilities per lattice point (this is further constrained by constraint 4 above).

Dynamic programming for solving the DTW problem (finding $f$ and $g$) then proceeds in incremental stages (see [9] for the complete algorithm), achieving an optimal time complexity of $O(PQ)$ ($P$ is the number of initial frames and $Q$ the "window width" from constraint 4). This means that we need to compute $d(\cdot,\cdot)$, the distance between two static shapes, only $O(PQ)$ times.
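
The recursion above can be sketched in a few lines. This is a minimal illustration, not the complete algorithm of [9]: the function name `dtw`, the tie-breaking rule (diagonal predecessor first), and the toy scalar signals are assumptions made here. The callable `d` stands for any static shape metric.

```python
import math

def dtw(a, b, d, Q=None):
    """Return (cost, f, g): minimal sum of d(a_f(t), b_g(t))^2 and 1-based warps."""
    L, M = len(a), len(b)
    if Q is None:
        Q = max(L, M)  # no window restriction
    D = [[math.inf] * (M + 1) for _ in range(L + 1)]
    back = [[None] * (M + 1) for _ in range(L + 1)]
    for i in range(1, L + 1):
        # Constraint 4: only visit lattice points with |i - j| <= Q.
        for j in range(max(1, i - Q), min(M, i + Q) + 1):
            local = d(a[i - 1], b[j - 1]) ** 2
            if i == 1 and j == 1:
                D[i][j] = local
                continue
            # The three legal predecessors allowed by constraints 1 and 3.
            preds = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
            pi, pj = min(preds, key=lambda p: D[p[0]][p[1]])
            if D[pi][pj] < math.inf:
                D[i][j] = local + D[pi][pj]
                back[i][j] = (pi, pj)
    if D[L][M] == math.inf:
        raise ValueError("window Q too narrow to reach (L, M)")
    # Backtrack from (L, M) to (1, 1) to recover the warping functions.
    path, node = [], (L, M)
    while node is not None:
        path.append(node)
        node = back[node[0]][node[1]]
    path.reverse()
    return D[L][M], [i for i, _ in path], [j for _, j in path]

# Toy 1D signals (assumed data); for shapes, a and b would be lists of frames.
cost, f, g = dtw([0, 0, 1, 2], [0, 1, 1, 2], d=lambda x, y: abs(x - y))
```

Because each lattice point inside the window is visited once and evaluates `d` once, the number of metric evaluations grows with the number of frames times the window width, in line with the $O(PQ)$ count stated above.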

4.  Dynamic  Shapes  Averaging 

With  the  basic  concepts  on  the  mean  of  static  shapes  and 
dynamic  time  warping,  we  are  now  ready  to  describe  the 
framework  for  dynamic  shape  average. 


4.1  Basic  Idea

We first define a dynamic shape as a sequence of static shapes (represented by any possible characterization):

Definition 2 Let $S$ be a set of static shapes (using any existing representation). A dynamic shape $\Gamma$ is an ordered sequence of static shapes $(C_1, \ldots, C_T) \in S^T$ ($T \in \mathbb{N}$ is the length of the dynamic shape).

Although the above definition is given for discrete times, it can be extended to continuous time.

The idea now is to combine dynamic time warping and static shape averaging:

Definition 3 Given a distance $d(\cdot,\cdot)$ on the set of static shapes, and $\Gamma_1(t_1), \ldots, \Gamma_N(t_N)$, $N$ dynamic shapes of respective lengths $T_i$ (i.e., $t_i = 1, \ldots, T_i$), their empirical mean is defined as

$$\text{for } 1 \le t \le T: \quad \Gamma(t) = M(\Gamma_1(f_1(t)), \ldots, \Gamma_N(f_N(t)))$$

where $f_1, \ldots, f_N$ are $N$ time-warping functions given by

$$(f_1, \ldots, f_N) = \arg\min_{f_1, \ldots, f_N} \sum_{t=1}^{T} \mu(\Gamma_1(f_1(t)), \ldots, \Gamma_N(f_N(t)))$$

with

$$\mu(C_1, \ldots, C_N) = \sum_{1 \le i < j \le N} d(C_i, C_j),$$

and $M(C_1, \ldots, C_N)$ is the mean of static shapes (see Definition 1).

In words, we start by finding (via DTW) optimal time correspondences between the static shapes, and after that we compute the average of the corresponding static shapes at each time instance. The warping is chosen so that the sum of pairwise distances $\mu$ is minimized.

This definition suggests considering the "distance" between two dynamic shapes as follows (there is no triangle inequality here):

Definition 4 Given two dynamic shapes $\Gamma_1$ and $\Gamma_2$, their "distance" is given by

$$D(\Gamma_1, \Gamma_2) = \sum_{t=1}^{T} d(\Gamma_1(f_1(t)), \Gamma_2(f_2(t))),$$

where $d(\cdot,\cdot)$ is the selected metric for static shapes and $f_1$ and $f_2$ are the optimal time warping functions.

This definition will be used later to compare human motions.


4.2  Basic  Improvements

With the simple use of DTW, the mean shape's length $T$ will be greater than or equal to the maximum of the individual time lengths $\{T_1, \ldots, T_N\}$. Therefore, the dynamic mean shape will in general be longer than the initial shapes (in Figure 2, $T_1 = 6$, $T_2 = 6$, $T = 8$). In order to correct this, we add jumps in the final path.

When $N = 2$, define, for $i \in \{1, 2\}$,

$$E_i \triangleq \{(\Gamma(t), \Gamma(t+1)) \mid f_i(t) = f_i(t+1)\}$$

where $\Gamma(\cdot)$ is the dynamic mean shape (from Definition 3). $E_1$ is the set of vertical segments and $E_2$ the set of horizontal segments in the graph representing the final path. In Figure 2, $E_1 = \{(\Gamma(6), \Gamma(7)), (\Gamma(7), \Gamma(8))\}$ and $E_2 = \{(\Gamma(2), \Gamma(3)), (\Gamma(3), \Gamma(4))\}$. These segments are responsible for the increase of the final length. Indeed, we have the simple relation ($T$ is the length of the mean shape without jumps):

$$T = T_1 + |E_1| = T_2 + |E_2|$$

We opt to replace every second pair in $E_1$ by its static average, and then we do the same for the pairs in $E_2$ (see Figure 3). Each replacement decreases the length by one.

Therefore $T'$, the length of the mean shape with the jumps, becomes:

$$T' = T - \frac{|E_1|}{2} - \frac{|E_2|}{2} = T_1 + \frac{|E_1| - |E_2|}{2}$$

$$T' = T_1 + \frac{T_2 - T_1}{2} = \frac{T_2 + T_1}{2}$$

The length of the final mean shape is then the average of the lengths of the two initial shapes.

In the general case ($N$ dynamic shapes), it is also intuitive that we would like the length of the final mean shape to equal the average of the lengths of the $N$ initial dynamic shapes. Therefore we now generalize the pairing process described above. Define, for any $A \subset \{1, \ldots, N\}$:

$$E_A \triangleq \{(\Gamma(t), \Gamma(t+1)) \mid f_i(t) = f_i(t+1) \Leftrightarrow i \in A\}$$

and, for any $i \in \{1, \ldots, N\}$:

$$\mathcal{A}_i = \{A \subset \{1, \ldots, N\} \mid i \in A\}$$

Then the following relations, where $T$ is the length of the mean shape without jumps, hold for any $i \in \{1, \ldots, N\}$:

$$T = T_i + \sum_{A \in \mathcal{A}_i} |E_A|$$

and

$$T - \frac{1}{N} \sum_{i=1}^{N} T_i = \frac{1}{N} \sum_{A \subset \{1, \ldots, N\}} |A| \cdot |E_A|$$

The right-hand term of the previous equality can be eliminated by the following process: for every subset $A$ of $\{1, \ldots, N\}$, we choose a number $\frac{|A|}{N} |E_A|$ of pairs belonging to $E_A$.$^3$ Then we replace each pair by its static average. The length of the final mean shape is then the average of the lengths of the $N$ initial shapes. Figure 4 shows the mean shape for simple initial shapes and for $N = 3$.
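
The pairing process for $N = 2$ can be sketched as follows. This is an illustration under stated assumptions: the static mean is taken to be the arithmetic mean of two frames (true for the distance-function metric of Section 5, not for every metric), frames are plain floats standing in for shape representations, and the warping functions are hypothetical, chosen only to reproduce the lengths $T_1 = 6$, $T_2 = 6$, $T = 8$ quoted for Figure 2. The function names are not from the paper.

```python
def warped_mean(gamma1, gamma2, f1, f2):
    """Definition 3 for N = 2, assuming M is the arithmetic mean of two frames."""
    return [(gamma1[i - 1] + gamma2[j - 1]) / 2.0 for i, j in zip(f1, f2)]

def stalled_pairs(f):
    """E_i as 0-based positions t with f(t) == f(t + 1)."""
    return [t for t in range(len(f) - 1) if f[t] == f[t + 1]]

def add_jumps(mean, f1, f2):
    """Replace every second pair of E_1 and of E_2 by its static average."""
    chosen = stalled_pairs(f1)[1::2] + stalled_pairs(f2)[1::2]
    out = list(mean)
    # Merge from the end so that earlier positions stay valid.
    for t in sorted(chosen, reverse=True):
        out[t:t + 2] = [(out[t] + out[t + 1]) / 2.0]
    return out

# Hypothetical 1-based warping functions with |E_1| = |E_2| = 2.
f1 = [1, 2, 3, 4, 5, 6, 6, 6]
f2 = [1, 2, 2, 2, 3, 4, 5, 6]
gamma1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]    # toy frames, T_1 = 6
gamma2 = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]   # toy frames, T_2 = 6

mean_without_jumps = warped_mean(gamma1, gamma2, f1, f2)  # length T = 8
mean_with_jumps = add_jumps(mean_without_jumps, f1, f2)   # length T' = 6
```

With real data the frames would be arrays (for example sampled distance functions), which support the same `+` and `/` operations. When $|E_i|$ is odd, the slice `[1::2]` rounds the number of merged pairs down, so the final length can differ from the exact average by a fraction of a frame.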

5.  Examples 

For our experiments, we represented a static shape $C$ by its distance function $\psi(x) = \min_{y \in C} \|x - y\|$ and we used the following simple metric on the set of static shapes:

$$d(C_1, C_2) = \int (\psi_1(x) - \psi_2(x))^2 \, dx$$

where $\psi_i(x)$ is the distance function to the shape $C_i$. For this distance, the average shape $C$ is the zero level set of

$$\Psi(x) = \frac{1}{2} (\psi_1(x) + \psi_2(x))$$

As  mentioned  in  the  introduction,  the  framework  here 
introduced  is  independent  of  the  particular  choice  of  the 
static  metric  d(-,  •),  and  we  have  selected  this  simple  one 
for  demonstration  purposes  only. 
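
A small discrete version of this metric can be sketched as follows. It is an illustration only: the distance function is computed by brute force on a tiny pixel grid rather than with the fast marching method used in the paper, the integral is replaced by a sum over pixels, and the function names and toy masks are assumptions made here. Extracting the level set of the averaged function is left out.

```python
import numpy as np

def distance_function(mask):
    """psi(x) = min over shape pixels y of ||x - y||, by brute force."""
    pts = np.argwhere(mask)                          # (K, 2) shape pixels
    grid = np.indices(mask.shape).reshape(2, -1).T   # (H*W, 2) all pixels
    diff = grid[:, None, :] - pts[None, :, :]        # (H*W, K, 2)
    return np.sqrt((diff ** 2).sum(-1)).min(1).reshape(mask.shape)

def shape_metric(psi1, psi2):
    """Discrete d(C1, C2): sum over pixels of (psi1 - psi2)^2."""
    return float(((psi1 - psi2) ** 2).sum())

def average_distance_function(psi1, psi2):
    """Pointwise average of the two distance functions."""
    return 0.5 * (psi1 + psi2)

# Two toy shapes (assumed data): single pixels on a 5 x 7 grid.
mask1 = np.zeros((5, 7), dtype=bool)
mask2 = np.zeros((5, 7), dtype=bool)
mask1[2, 2] = True
mask2[2, 4] = True

psi1 = distance_function(mask1)
psi2 = distance_function(mask2)
psi_avg = average_distance_function(psi1, psi2)
```

The brute-force computation costs one distance per pixel and shape point, so it only suits small grids; it is used here because it needs nothing beyond NumPy.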

The segmentation of the input pictures is done by simple thresholding ([12, 10]), and the distance function $\psi_i$ for each shape $C_i$ is computed with the fast marching method (for details, see [5, 11, 17]). Figure 5 shows an example (one frame) of an initial dynamic shape.

In Figures 6 and 8, we present a number of frames from two initial video clips (dynamic shapes), followed by sampled frames from the average dynamic shape computed without using DTW, and finally sampled frames from the average dynamic shape computed with our technique. Figures 7 and 9 show the corresponding DTW graphs.

In Figure 10, we present some frames from three initial video clips (three walking men), followed by sampled frames from the mean dynamic shape computed without using DTW, and finally with our technique.

Using Definition 4, we can compare different dynamics, such as running vs. walking men. As observed in the table below, this function is roughly five times greater between one running man and one walking man than between two running or two walking men.


          walk 1   walk 2   run 1   run 2
walk 1    0        1.3      5.6     6.3
walk 2    1.3      0        5.2     6.7
run 1     5.6      5.2      0       1.1
run 2     6.3      6.7      1.1     0
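
The "five times" figure can be checked directly from the table. The short sketch below copies the table values and compares the average between-class value (walk vs. run) with the average within-class value; the variable names are chosen here for illustration.

```python
# Values copied from the table above (Definition 4 "distance").
labels = ["walk 1", "walk 2", "run 1", "run 2"]
table = [
    [0.0, 1.3, 5.6, 6.3],
    [1.3, 0.0, 5.2, 6.7],
    [5.6, 5.2, 0.0, 1.1],
    [6.3, 6.7, 1.1, 0.0],
]

kind = [name.split()[0] for name in labels]  # "walk" or "run"
within, between = [], []
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        # Same kind of motion goes to "within", different kinds to "between".
        (within if kind[i] == kind[j] else between).append(table[i][j])

ratio = (sum(between) / len(between)) / (sum(within) / len(within))
```

The between-class average is 5.95 and the within-class average is 1.2, so the ratio is just under five.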

3  We  choose  these  pairs  uniformly  spread  in  time. 


6.  Conclusion 

A novel framework for performing shape statistics in dynamic shapes was described in this paper. The basic idea is to combine shape alignment with previously developed ideas from static shape studies. The shape alignment is based on dynamic time warping. The framework is independent of the metric between static shapes.

A number of directions are suggested by the line of research initiated here. First of all, other more advanced static shape metrics need to be used, including those that incorporate landmarks, found to be fundamental for medical applications [15]. Once these advanced metrics are incorporated into our framework, we can proceed with more exhaustive experimentation, including 3D dynamic shapes. Of particular interest are the analysis and recognition of gait and the study of growth in medical applications.

In this paper we limited ourselves to the case of discrete time. In the continuous case, a variational formulation to address the dynamic shape average problem can be formulated as

$$\arg\min_{\Gamma, f_i} \int \sum_{1 \le i \le N} \left[ d(\Gamma(t), \Gamma_i(f_i(t)))^2 + H(f_i(t)) \right] dt$$

where $f_i$ are the time warping functions, and $H$ represents some constraints on them (such as continuity, monotonicity, acceleration, etc.). To this we can add time domain landmarks (e.g., by splitting the domain). These topics are the subject of current efforts in our group.

Figure 1. A simple example showing the importance of time alignment when performing shape statistics. The first two rows show two dynamic shapes $\Gamma_1(t)$ and $\Gamma_2(t)$. The following two rows show their mean: a) computed without DTW alignment, b) computed with enhanced DTW alignment. We clearly observe the need for the DTW step.

Figure 2. Dynamic time warping example.

Figure 3. Example of the introduction of jumps in DTW for $N = 2$.

Figure 4. Example of mean shape, with jumps, for simple initial shapes and $N = 3$.


Figure 5. From left to right: initial image, segmented shape, and distance function.

Figure 6. Example of two walking men. The two dynamic shapes are given first, followed by the mean without DTW (third row), and finally the mean with DTW (last row). Note how the lack of time alignment creates topological errors, not present in the average when DTW is used.

Figure 9. Graph corresponding to the DTW for the hands sequences.

Figure 10. Example of three walking men. The three dynamic shapes are given first, followed by the mean without DTW (fourth row), and finally the mean with DTW, our proposed technique (last row). Once again, note the significant improvement when the time warping is added to the shape statistics process.


